
(ECCV 2016) Stacked hourglass networks for human pose estimation

Posted on 2018-01-03 in Paper Note, Basic Tasks

Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 483-499.

1. Overview

The paper proposes Stacked Hourglass Networks, a model architecture for single-person pose estimation built on:

repeated bottom-up, top-down processing (Hourglass)
intermediate supervision

The architecture captures and consolidates information across all scales of the image.

Experiments are run on the following datasets:

FLIC
MPII

2. Model Architecture

2.1. Hourglass

Each box in the figure is a residual module. The top-down path uses only upsampling, not deconvolution.

Bottom-up. Conv + ReLU + BN + Max pooling
Top-down. Upsampling + Add
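
The bottom-up / top-down recursion can be sketched in plain Python on 2D lists. This is only an illustration under stated assumptions: `residual` is an identity placeholder standing in for a learned residual module, and the function names and the `depth` parameter are hypothetical, not taken from the paper's code.

```python
def max_pool_2x2(grid):
    """Halve height and width by taking the max of each 2x2 window."""
    h, w = len(grid), len(grid[0])
    return [[max(grid[y][x], grid[y][x + 1],
                 grid[y + 1][x], grid[y + 1][x + 1])
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def upsample_nearest_2x(grid):
    """Double height and width by repeating values (no learned deconv)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a, b):
    """Element-wise sum of two grids of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def residual(grid):
    """Placeholder for a learned residual module (assumption: identity)."""
    return [row[:] for row in grid]


def hourglass(grid, depth):
    """One hourglass: pool down `depth` times, then upsample and add skips."""
    skip = residual(grid)                     # branch kept at this resolution
    low = residual(max_pool_2x2(grid))        # bottom-up: pool, then process
    if depth > 1:
        low = hourglass(low, depth - 1)       # recurse at the lower resolution
    up = upsample_nearest_2x(residual(low))   # top-down: upsample only
    return add(skip, up)                      # merge with the skip branch


# Height and width must be divisible by 2 ** depth.
out = hourglass([[1] * 4 for _ in range(4)], depth=2)
```

The output has the same resolution as the input, which is what allows several hourglasses to be stacked end to end.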

2.2. Residual Block & Full Network

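The network input is 256x256 and the output is 64x64
The network starts with a Conv(7x7, stride 2)
Every residual block outputs 256 channels
The Hourglass modules in the network do not share parameters
Intermediate supervision uses the same ground truth at every stage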
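
As a quick check on how a 256x256 input can reach 64x64, the resolution arithmetic can be traced with the standard output-size formula. The note only specifies the 7x7 stride-2 convolution; the padding of 3 and the single 2x2 max pool below are assumptions chosen so that the numbers work out, and `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 256
# Conv(7x7, stride 2); padding=3 is an assumption
after_stem = conv_out(size, kernel=7, stride=2, padding=3)
# One 2x2 max pool with stride 2 (assumption)
after_pool = conv_out(after_stem, kernel=2, stride=2, padding=0)

print(after_stem, after_pool)  # 128 64
```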

3. Experiments

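3.1. Data Processing

Multi-person images. Both the MPII training and test sets provide the target person's center and scale (as a multiple of 200 pixels). Use these to crop the person, resize the crop to 256x256, and train on the result. The target person's center can also be shifted to the center of the image
Data augmentation. Rotation (±30°), scaling (0.75-1.25), no translation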
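
The cropping and augmentation steps might look roughly like the sketch below. It assumes the crop is a square whose side is exactly scale x 200 pixels and that augmentation parameters are drawn uniformly; neither detail is stated in the note, and all function names are hypothetical.

```python
import random

REFERENCE_PX = 200  # MPII scale is a multiple of 200 pixels
INPUT_RES = 256     # network input resolution


def crop_box(center, scale):
    """Square box (x0, y0, x1, y1) around the target person, in pixels.

    Assumption: the box side equals scale * 200 with no extra margin.
    """
    cx, cy = center
    half = scale * REFERENCE_PX / 2
    return (cx - half, cy - half, cx + half, cy + half)


def resize_factor(scale):
    """Factor that maps the cropped box onto a 256x256 input."""
    return INPUT_RES / (scale * REFERENCE_PX)


def sample_augmentation(rng):
    """Draw a rotation in degrees and a zoom factor (uniform is an assumption)."""
    rotation = rng.uniform(-30.0, 30.0)
    zoom = rng.uniform(0.75, 1.25)
    return rotation, zoom


box = crop_box(center=(300, 400), scale=1.5)  # (150.0, 250.0, 450.0, 550.0)
rotation, zoom = sample_augmentation(random.Random(0))
```

Because the crop is centered on the annotated person, the network needs no translation augmentation to locate the target.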

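3.2. Training & Testing

MSE is used as the loss
At test time, the predictions for the original image and the flipped image are averaged to give the final prediction
Evaluation metric (PCK). FLIC: distances normalized by torso size; MPII: distances normalized by head size (PCKh)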
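
The loss, the flip averaging, and the metric can be sketched as follows, with heatmaps as 2D lists and joints as (x, y) tuples. Summing the per-stack losses, the single-heatmap flip, and the `threshold` argument are assumptions for illustration; the note does not specify them, and the function names are hypothetical.

```python
import math


def mse(pred, target):
    """Mean squared error between two heatmaps given as 2D lists."""
    total, count = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            total += (p - t) ** 2
            count += 1
    return total / count


def supervised_loss(stack_outputs, target):
    """Intermediate supervision: each stack is compared to the same target.

    Assumption: the per-stack losses are simply summed.
    """
    return sum(mse(out, target) for out in stack_outputs)


def flip_lr(heatmap):
    """Mirror a heatmap horizontally."""
    return [row[::-1] for row in heatmap]


def flip_average(pred, pred_of_flipped):
    """Average a prediction with the mirrored-back prediction of the flipped image."""
    restored = flip_lr(pred_of_flipped)
    return [[(a + b) / 2 for a, b in zip(ra, rb)]
            for ra, rb in zip(pred, restored)]


def pck(preds, truths, norm_size, threshold):
    """Fraction of joints whose normalized distance is within the threshold.

    norm_size is the torso size (FLIC) or head size (MPII).
    """
    hits = sum(
        math.hypot(p[0] - t[0], p[1] - t[1]) / norm_size <= threshold
        for p, t in zip(preds, truths)
    )
    return hits / len(preds)
```

With several joint heatmaps, the left and right joint channels would also need to be swapped after flipping; the single-heatmap sketch leaves that step out.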

3.3. 实验结果

3.4. Ablation

3.5. Multiple People

3.6. Occlusion